31 research outputs found

    The DeepZen Speech Synthesis System for Blizzard Challenge 2023

    This paper describes the DeepZen text-to-speech (TTS) system for Blizzard Challenge 2023. The goal of this challenge is to synthesise natural, high-quality speech in French, from a large single-speaker dataset (hub task) and from a smaller dataset by speaker adaptation (spoke task). We participated in both tasks with the same model architecture. Our approach is to use an auto-regressive model, which retains an advantage for generating natural-sounding speech, while improving prosodic control in several ways. As in Non-Attentive Tacotron, the model uses a duration predictor and Gaussian upsampling at inference, but with a simpler unsupervised training procedure. We also model the speaking style at both sentence and word levels by extracting global and local style tokens from the reference speech. At inference, the global and local style tokens are predicted from a BERT model run on the text. This BERT model is also used to predict specific pronunciation features such as schwa elision and optional liaisons. Finally, a modified version of HiFi-GAN, trained on a large public dataset and fine-tuned on the target voices, is used to generate the speech waveform. Our team is identified as O in the Blizzard evaluation, and MUSHRA test results show that our system ties for second place in both the hub task (median score of 0.75) and the spoke task (median score of 0.68), out of 18 and 14 participants, respectively. Comment: Blizzard Challenge 202
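A minimal sketch of the Gaussian upsampling step mentioned in the abstract above, following the formulation commonly associated with Non-Attentive Tacotron: each token gets a centre at the midpoint of its predicted duration, and every output frame is a Gaussian-weighted average of all token vectors. The function name, the plain-list representation, and the fixed standard deviations are assumptions for illustration; the abstract does not specify how the DeepZen system implements this step.

```python
import math


def gaussian_upsample(encodings, durations, sigmas):
    """Expand per-token vectors to per-frame vectors with Gaussian weights.

    encodings: list of token vectors (lists of floats, equal length)
    durations: predicted duration of each token, in frames
    sigmas:    standard deviation (range) of each token, in frames
    """
    # Centre of token i: end of token i minus half its duration.
    centres = []
    end = 0.0
    for d in durations:
        end += d
        centres.append(end - d / 2.0)

    num_frames = int(round(end))
    dim = len(encodings[0])
    frames = []
    for t in range(1, num_frames + 1):
        # Gaussian density of frame t under each token; the constant
        # 1/sqrt(2*pi) cancels in the normalisation and is omitted.
        dens = [math.exp(-0.5 * ((t - c) / s) ** 2) / s
                for c, s in zip(centres, sigmas)]
        total = sum(dens)
        weights = [x / total for x in dens]
        frames.append([sum(w * h[k] for w, h in zip(weights, encodings))
                       for k in range(dim)])
    return frames


# Two tokens lasting 2 and 3 frames yield 5 output frames.
frames = gaussian_upsample([[1.0, 0.0], [0.0, 1.0]], [2, 3], [1.0, 1.0])
print(len(frames))
```

Because the weights vary smoothly with the frame index, the transition between tokens is soft rather than a hard repetition of each token vector, and the operation stays differentiable with respect to the durations.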

    An Excitation Model for HMM-Based Speech Synthesis Based on Residual Modeling

    SSW6: 6th ISCA Speech Synthesis Workshop, August 22-24, 2007, Bonn, Germany. This paper describes a trainable excitation approach to eliminate the unnaturalness of HMM-based speech synthesizers. During waveform generation, mixed excitation is constructed by state-dependent filtering of pulse trains and white noise sequences. During training, filters and pulse trains are jointly optimized through a procedure that resembles analysis-by-synthesis speech coding algorithms, in which the likelihood of residual signals (derived from the same database used to train the HMM-based synthesizer) is maximized. Preliminary results show that the proposed excitation model eliminates the unnaturalness of synthesized speech, with quality comparable to the best approaches reported so far for removing the buzziness of HMM-based synthesizers.
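The waveform-generation idea in the abstract above (a pulse train and a white noise sequence, each passed through its own filter and then summed) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the FIR filter form, and the single fixed pair of filters are hypothetical, whereas the paper uses state-dependent filters learned jointly with the pulse trains, which is not shown here.

```python
import random


def fir_filter(signal, taps):
    """Causal FIR filter by direct convolution, zero initial state."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def mixed_excitation(num_samples, pitch_period, voiced_taps, unvoiced_taps, seed=0):
    """Sum of a filtered pulse train and filtered white Gaussian noise."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # One unit pulse every `pitch_period` samples.
    pulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    voiced = fir_filter(pulses, voiced_taps)
    unvoiced = fir_filter(noise, unvoiced_taps)
    return [v + u for v, u in zip(voiced, unvoiced)]


# Hypothetical filter taps, chosen only to make the example run.
excitation = mixed_excitation(160, 80, [1.0, 0.5, 0.25], [0.1, 0.05])
print(len(excitation))
```

In a state-dependent version, the two tap vectors would change from one HMM state to the next, so the balance between periodic and noisy energy could follow the sound being synthesized.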

    A fixed dimension and perceptually based dynamic sinusoidal model of speech

    This paper presents a fixed- and low-dimensional, perceptually based dynamic sinusoidal model of speech referred to as PDM (Perceptual Dynamic Model). To reduce and fix the number of sinusoidal components typically used in the standard sinusoidal model, we propose to use only one dynamic sinusoidal component per critical band. For each band, the sinusoid with the maximum spectral amplitude is selected and associated with the centre frequency of that critical band. The model is extended at low frequencies by incorporating sinusoids at the boundaries of the corresponding bands, while at higher frequencies a modulated noise component is used. A listening test is conducted to compare speech reconstructed with PDM and with state-of-the-art models of speech, where all models are constrained to use an equal number of parameters. The results show that PDM is clearly preferred in terms of quality over the other systems. Index Terms: sinusoidal model, critical band, vocoder
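The per-band selection rule described in the abstract above can be sketched in a few lines: within each band, keep only the component with the largest amplitude and re-assign it to the band centre. The function name, the band edges, and the use of the arithmetic midpoint as the "centre frequency" are assumptions made for illustration; the paper's actual critical-band table, the extra low-frequency sinusoids, and the high-frequency noise component are not reproduced.

```python
def pick_band_peaks(freqs_hz, amplitudes, band_edges_hz):
    """Keep one sinusoid per band: the largest one, moved to the band centre."""
    selected = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        # Components whose frequency falls inside the band [lo, hi).
        in_band = [(a, f) for f, a in zip(freqs_hz, amplitudes) if lo <= f < hi]
        if not in_band:
            continue  # no component in this band
        peak_amp, peak_freq = max(in_band)
        selected.append({
            "centre_hz": (lo + hi) / 2.0,  # assumed centre: arithmetic midpoint
            "amplitude": peak_amp,
            "source_hz": peak_freq,        # where the peak was actually found
        })
    return selected


# Hypothetical band edges in Hz and a toy set of sinusoidal components.
peaks = pick_band_peaks([50, 120, 180, 250], [0.2, 0.9, 0.4, 0.1], [0, 100, 200, 300])
print(peaks)
```

The output always has at most one entry per band, which is what makes the parameter count fixed regardless of how many harmonics the analysed frame contains.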

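    LoTuS: An Extensible Graphical Tool for Modeling, Analysis, and Verification of LTS and PLTS Models (original title: LoTuS: uma Ferramenta Gráfica Extensível para Modelagem, Análise e Verificação de Modelos LTS e PLTS)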

    This paper presents LoTuS, a tool for graphical modeling, analysis, and verification of software behavior using LTS and PLTS. Its main contributions are: easing the formal modeling process through a drag-and-drop mechanism that supports the creation of both non-probabilistic and probabilistic models; allowing models to be generated from other sources, such as UML sequence diagrams or execution traces; providing a set of model analysis techniques, such as simulation, execution, deadlock detection, and probabilistic verification of reachability properties; and, finally, offering an API so that developers can add new functionality by creating plugins. The tool was evaluated in terms of its usability and performance, and through a case study in which its main features were exercised.
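Two of the analyses named in the abstract above, deadlock detection and probabilistic reachability, can be illustrated on a tiny transition system. This sketch does not use or reproduce the LoTuS API: the dictionary encoding, the function names, and the example model are all hypothetical, and every reachable state without outgoing transitions is treated as a deadlock, even though a real tool may distinguish intended final states.

```python
from collections import deque


def find_deadlocks(transitions, initial):
    """Return reachable states that have no outgoing transitions (BFS order)."""
    seen = {initial}
    queue = deque([initial])
    deadlocks = []
    while queue:
        state = queue.popleft()
        successors = transitions.get(state, [])
        if not successors:
            deadlocks.append(state)
        for _label, target, _prob in successors:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return deadlocks


def reach_probability(transitions, initial, goal, sweeps=1000):
    """Approximate the probability of eventually reaching `goal`.

    Repeatedly applies prob[s] = sum(p * prob[target]) starting from zero,
    which converges to the reachability probability from below.
    """
    states = set(transitions) | {initial, goal}
    states |= {t for outs in transitions.values() for _, t, _ in outs}
    prob = {s: 0.0 for s in states}
    prob[goal] = 1.0
    for _ in range(sweeps):
        for s in states:
            if s != goal:
                prob[s] = sum(p * prob[t] for _, t, p in transitions.get(s, []))
    return prob[initial]


# Hypothetical model: state -> list of (action label, target state, probability).
model = {
    "idle": [("send", "wait", 1.0)],
    "wait": [("ack", "done", 0.9), ("timeout", "failed", 0.1)],
}
print(find_deadlocks(model, "idle"))
print(reach_probability(model, "idle", "done"))
```

A fixed number of sweeps keeps the sketch short; a more careful version would stop once the largest change between sweeps falls below a tolerance.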